An Efficient Algorithm for Data Cleaning

Authors

  • Payal Pahwa
  • Rajiv Arora
  • Garima Thakur
Abstract

The quality of real-world data fed into a data warehouse is a major concern today. Because the data comes from a variety of sources, it must be checked for errors and anomalies before it is loaded into the data warehouse. The source data may contain exact duplicate records or approximate duplicate records. Incorrect or inconsistent data can significantly distort the results of analyses, often negating the potential benefits of information-driven approaches. This paper addresses the detection and correction of such duplicate records. It also analyzes data quality and the factors that degrade it. Existing work is briefly reviewed and its major limitations are pointed out. A new framework is then proposed that improves on the existing technique.
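
The distinction between exact and approximate duplicate records can be illustrated with a minimal sketch. It is not the framework proposed in the paper, whose details are not given in this abstract. The helper names (`normalize`, `similarity`, `find_duplicates`), the sample records, and the 0.85 threshold are all assumptions chosen for illustration, and the pairwise comparison is only practical for small inputs.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lowercase and collapse whitespace in every field of a record."""
    return tuple(" ".join(str(field).lower().split()) for field in record)


def similarity(a, b):
    """Average character-level similarity across aligned fields (0.0 to 1.0)."""
    scores = [SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)]
    return sum(scores) / len(scores)


def find_duplicates(records, threshold=0.85):
    """Return (exact, approximate) lists of index pairs.

    `threshold` is an illustrative assumption, not a value from the paper.
    """
    normalized = [normalize(r) for r in records]
    exact, approximate = [], []
    # Compare every pair once; quadratic, so suitable only for small inputs.
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            if normalized[i] == normalized[j]:
                exact.append((i, j))
            elif similarity(normalized[i], normalized[j]) >= threshold:
                approximate.append((i, j))
    return exact, approximate


# Hypothetical source records (name, street, city).
records = [
    ("John Smith", "12 Park Road", "Delhi"),
    ("john  smith", "12 Park Road", "Delhi"),
    ("Jon Smith", "12 Park Rd", "Delhi"),
    ("Anita Rao", "5 Lake View", "Pune"),
]
exact, approximate = find_duplicates(records)
```

Here the first two records become identical after normalization and are reported as an exact pair, while the third differs by a misspelling and an abbreviation and is reported only as an approximate match to each of them.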

Similar resources

A Heuristic Algorithm for Nonlinear Lexicography Goal Programming with an Efficient Initial Solution

In this paper, a heuristic algorithm is proposed to solve a nonlinear lexicography goal programming (NLGP) problem by using an efficient initial point. Some numerical experiments showed that the search quality of the proposed heuristic in a multi-objective problem depends on the features of the initial point, so in the proposed approach the initial point is retrieved by Data Envelopment Analysis...

Keyword query cleaning

Unlike traditional database queries, keyword queries do not adhere to predefined syntax and are often dirty with irrelevant words from natural languages. This makes accurate and efficient keyword query processing over databases a very challenging task. In this paper, we introduce the problem of query cleaning for keyword search queries in a database context and propose a set of effective and ef...

An Efficient Extension of Network Simplex Algorithm

In this paper, an efficient extension of the network simplex algorithm is presented. In the static scheduling problem, where the situation does not change, the challenge is to solve large problems in a short time. In this paper, the static scheduling problem of Automated Guided Vehicles in a container terminal is solved by the Network Simplex Algorithm (NSA) and NSA+, which extended the stand...

Cleaning Process with Efficient Allocation Scheme Improves Flash Memory Performance

Flash memory is a non-volatile storage device that offers many superior features. However, it has two characteristics, namely 1) out-place updating and 2) a cleaning process, that affect its performance as an efficient storage sub-system. Both characteristics influence the access time required for continuous data storing and updating. In this paper, we propose an efficient...

Accurate Fruits Fault Detection in Agricultural Goods using an Efficient Algorithm

The main purpose of this paper was to introduce an efficient algorithm for fault identification in fruit images. First, the input image was de-noised using a combination of Block Matching and 3D filtering (BM3D) and a Principal Component Analysis (PCA) model. Afterward, in order to reduce the size of the images and increase the execution speed, a refined Discrete Cosine Transform (DCT) algorithm was uti...

An Efficient Adaptive Boundary Matching Algorithm for Video Error Concealment

Sending compressed video data in error-prone environments (such as the Internet and wireless networks) might cause data degradation. Error concealment techniques try to conceal errors in the received data on the decoder side. In this paper, an adaptive boundary matching algorithm is presented for recovering the damaged motion vectors (MVs). This algorithm uses an outer boundary matching or directional tempo...

Journal:
  • IJKBO

Volume 1

Publication date: 2011